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Abstract. We propose a novel method to accurately reconstruct a set of images representing a single scene 
from few linear multi-view measurements. Each observed image is modeled as the sum of a background image and a 
foreground one. The background image is common to all observed images but undergoes geometric transformations, 
as the scene is observed from different viewpoints. In this paper, we assume that these geometric transformations 
are represented by a few parameters, e.g., translations, rotations, affine transformations, etc.. The foreground images 
differ from one observed image to another, and are used to model possible occlusions of the scene. The proposed 
reconstruction algorithm estimates jointly the images and the transformation parameters from the available multi-view 
measurements. The ideal solution of this multi-view imaging problem minimizes a non-convex functional, and the 
reconstruction technique is an alternating descent method built to minimize this functional. The convergence of the 
proposed algorithm is studied, and conditions under which the sequence of estimated images and parameters converges 
to a critical point of the non-convex functional are provided. Finally, the efficiency of the algorithm is demonstrated 
using numerical simulations for applications such as compressed sensing or super-resolution. 

Keywords, inverse problem, image registration, non-convex optimization. 



1. Introduction 

Multi-view imaging has become more and more popular in the signal processing community during 
the past years with, for example, the design of camera arrays [T 43 and the rise of new applications 



such as 3D reconstruction of a scene 19 36 



In most multi-view imaging applications, the pose of the cameras, or of the sensing systems, is 
estimated by pre-calibration or by using side-information. In this paper, we study the problem of 
reconstructing a set of images, representing a single scene, from few measurements made at different 
viewpoints in absence of (or with inaccurate) prior pose estimation. This situation may occur in 
different scenarios. 

For example, let us imagine that we have a camera that can be sent up to take pictures of the ground 
and that we are interested in obtaining higher resolution images from these acquisitions. Without 
more knowledge on the acquisition strategy, we can treat each image independently using a single- 
frame super-resolution technique. However, if we know that the duration between the capture of two 
images is not too long, it is highly probable that parts of the scene visible in one frame remain visible 
in the subsequent ones. Consequently, we can surely benefit from the high inter-correlation between 
observations to obtain better super-resolved images by jointly reconstructing several frames together, 



as in, e.g., 16 41 . The inter-correlation may be modeled using geometric transformations, which 
depend on the camera pose, that register the images with respect to each other. In the absence of any 
side information, these transformations have to be estimated along with the high resolution images 
during the reconstruction process. 

As another example, let us consider the problem of designing a similar camera as above but with 
strong power consumption constraints. To design such an energy-efficient system, compressed sensing 



(CS) is a powerful tool 12 22 . Indeed, this theory states that sparse signals can be sampled with 
just a few linear and non-adaptive measurements. In scenarios where signals related to common 
phenomena are acquired, it is actually possible to reduce even further the number of measurements by 
jointly reconstructing the signals with a method exploiting the inter-correlation between them |6||15]. 



A technique, as in 23 42 , able to estimate both the geometric transformations between the observed 
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images and the images themselves from the compressed measurements can thus help us to meet our 
power consumption constraints. 

In the present work, we propose a novel method that estimates jointly a set of correlated images 
and the geometric transformations that align these images on each other. This technique also robustly 
handles the appearance of new objects in the scene and can be applied in different settings, such as 
the ones described above. 

1.1. Problem formulation 

Let u = (ui,v,2) G R 2 represent the Cartesian coordinates of a point on the Euclidean plane, and 
G = {iik}i^k^n be a square uniform grid of n points. We model continuous images as functions 
x: R 2 — > R living in a space {x(u) = Y^i=i Xi l f'( u ~ x i G ^ Ui £ G, i = 1, . . . , n}, where 
(p : R 2 — > K is a generating function [40j. For simplicity, we suppose that ip(uk — it,:) = 5{uk — Ui) for 
all Uk,Ui 6 G, so that the discrete image x € R™ obtained by sampling x on the grid G has samples 
X\ , . . . , x n . 

In our setting, several observers provide different observations y\, . . . ,yi € R m , with m ^ n, of a 
scene from different viewpoints. As a first approximation, one can consider that these observations 
describe a single image xq, referred as the background image hereafter. This image is "viewed" from 
different perspectives and thus undergoes geometric transformations. We consider here that these 
transformations are unknown and need to be estimated. However, we restrict ourselves to global 
transformations represented by few parameters, such as translations, rotations, or affine transforma- 
tions. For each observer j, with j = 1, these transformations are modeled using a function 
T0, : M 2 — > R 2 , depending on q parameters Oj £ K 9 , that maps the coordinates u into new coordinates 
Tg j (it) . The background image transformed by Te j is thus £o ° T 9j ■ 

To complete the model, we also take into account possible occlusions of the scene. These occlusions 
may obviously be different from one observer to another. We model them here using I foreground 
images x\, . . . , Xi and write the j observed image as x o tq. + Xj. 

Finally, we assume that the observations yj are obtained by linear projection of xqoto i + xj onto m 
known functions et{, . . . , : M 2 — > 1R. These functions model the acquisition system with which the 
images are observed. They can represent a blurring operator for a deconvolution problem, or random 
waveforms in a compressed sensing setting. The i th entry of yj satisfies 



where xi and x 3 k , with k = 1, . . . , n, are the samples describing xq and Xj respectively. To facilitate the 
implementation of this observation model, we assume that the grid G has a sufficiently high resolution 
so that the integral above is "well" approximated by its Rlemann sum on this grid. In the following, 
we consider that 



where we supposed that the pixels of the grid have a unit square area. Therefore, the measurements 
vector yj is equal to Aj (5(0j)x Q + Xj), where Aj = (al(u k )) lk <E R mxn , and S: R g -> R nxn satisfies 
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Concatenating all the observations in a single vector, we have 

' Vl 1 [" AiS(0i) A, ... 
(1.2) : = : ; i 

... A> 



A,S(0 ; ) 

where we introduced I vectors rax, . . . , n; € 





' a; " 


+ 






x l 







to model additive measurement noise. Our goal is to 



design a method that estimates the discrete images x = (xq : . . . , xj) 1 € ]R(' +1 '", and the transforma- 
tion parameters 9 1 = (Oj, . . . ,8]) J G M. lq , using y = (yj, . . . ,yJ) J £ R lm as sole information. 

The inter-correlation between the observed images appears explicitly in the measurement model 
(1.2). We consider that the observed images share a common signal - the background image - which 
lives on a low-dimensional manifold. In the literature, other models that exploit the inter-correlation 
between the measurements have been proposed. 



1.2. Related works 

The problem of reconstructing an image from multi-view measurements has been studied by several 
authors in contexts such as compressed sensing, super-resolution, or robust image alignment. We try 
here to give a brief overview of the different types of techniques used. 

When the measurements are obtained by compressed sensing, some techniques, like ours, rely on the 



estimation of geometric transformations between images. In 13 , the authors propose to reconstruct 
the background image using an over-complete dictionary created by scaling, translating and rotating 
several times a mother waveform. The image is assumed to have a sparse decomposition in this dictio- 
nary, and the reconstruction is done by using the observations y% , . . . , y\ one after another. We note 
that this method does not deal with the problem of occlusions, and that, as mentioned by the authors, 
the quality of the final reconstruction depends on the order in which the measurements are taken. 



In 23 42 , the authors suppose that the observed images live exactly on a low-dimensional manifold. 
Let us remark that it would be the case in our model in absence of occlusions. For the reconstruction, 
the problem is separated into two separate tasks: registration and image estimation. The authors use 
a manifold lifting algorithm for the registration and standard compressed sensing algorithms for the 
image estimation. We note that to work efficiently, the manifold lifting algorithm requires the use of a 
specific measurement matrix, implemented with noiselets, to estimate the transformation parameters 
in a coarse-to-fine scales fashion. 

Continuing with compressed sensing, other techniques rely on more general interpolation model 
between images. In [37||38| , the images are first estimated with a bidirectional interpolation using two 
known neighbor images and then corrected using the acquired measurements. An iterative process, 
alternating between interpolation and reconstruction, is then proposed to refine the estimation of the 
images. Note that a similar idea is used in [2] for dynamic MRI. In 21 , a slightly simpler model is 
used. The authors assume that the difference between two images is sparse in a wavelet basis and 
that the background image is sparse in the gradient domain. Then, a convex minimization problem is 
solved to reconstruct jointly all the images. In 14p7 , the interpolation relies on "disparity maps" that 
indicate the correspondences between the pixels of two images. These maps are used to estimate the 
images from the measurements, and the newly estimated pictures can be used to refine the disparity 
maps. The process is then iterated several times. Let us remark that the authors do not provide any 
proof of convergence. We note that the authors deal with the problem of occlusions by supposing that 
they are sparse in image space. Let us also mention that a similar idea is proposed in 35 . Finally, 
in 30 , the authors use a linear dynamical system to model the intercorrelation between frames, and 
reconstruct a video sequence. 

When the matrices Ai, . . . , A m implement a blurring and downsampling operator, one can try to 
obtain a high-resolution background image from the images observed at low-resolution by solving 
(1.2). Super-resolution from multiple frames has also been widely studied and several image priors 



and correlation models between images have been proposed to solve this problem. As before, some 



techniques model the correlation between images using geometric transformations 16 41 while other 



ones use more general correlation models 26 32 . 
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Finally, let us highlight a similarity between our model and the one used in 24 25 for robust 
image alignment, which consists in aligning a set of images despite the presence of large occlusions. 
With the measurement model (1.2 1, one can attempt to solve this problem by initializing the vectors 
t/i, . . . , yi with the observed images and the observations matrices Ai, . . . , A; with the identity. Then, 
the algorithm should estimate the background image, the transformation parameters that align this 



background image on the observed ones, and the foreground images modeling occlusions. In 24 25 
a different approach is proposed. The authors concatenate the initial images yi,...,yi in a matrix 
and explain that, after alignment of these images, this matrix can be decomposed as the sum of a 
low-rank matrix (modeling correlated components) and a sparse matrix (modeling occlusions). This 
model serves as the criterion to align the images. Note that in the limit where the rank of the final 
matrix modeling the correlated components is 1, their method, like ours, also decomposes the initial 
images into a single background image and / foreground images. 



1.3. Main contribution and organization of the paper 

We propose here a novel algorithm for joint registration and reconstruction of a set of misaligned 
images, and, unlike the related methods presented in the previous section, study the convergence of 



the algorithm. Our method relies on the measurement model (1.2 1, and is robust to measurement 
noise and occlusions. The technique can be applied in different scenarios such as compressed sensing, 
super-resolution, or robust image registration. 

In Section [2j we identify the solution of the multi-view imaging problem with the global minimizer 
of a non-convex functional, and propose an algorithm for the minimization of this functional. In 
Section [3j we study the convergence of the reconstruction method, and prove that the sequence of 
estimates (images and transformation parameters) converges to a critical point of the functional. The 
efficiency of the algorithm is then demonstrated experimentally in Section [4] for several problems. In 
the spirit of reproducible research, our codeQ and data needed to reproduce the results presented in 
this paper can be downloaded openly at http://lts2www.epfl.ch/people/gilles/softwares. We 
finally conclude in Section [5] Some results required to prove the convergence of the algorithm are 
gathered in Appendix. 

Note that we already briefly studied the problem of reconstructing a set of images from multi-view 



measurements in 27 , where the measurement model (1.2) was also used. However, the technique 



proposed to reconstruct the images was based on the Bregman iterative regularization procedure, 



which is different from the one used here. Furthermore, the convergence results presented in 27 
essentially concern the decrease of the objective value and, unlike here, do not ensure the convergence 
of the sequence of estimated images and parameters. 



1.4. Notations and definitions 

The Euclidean scalar product of R n is denoted (•,•) and || • ||2 is the corresponding ^-norm. The 
£i-norm of a vector x = € K n is defined as ||a;||i = Y^i=i \ x i\- 

Let C be a closed convex subset of W 1 . The indicator function of C is denoted ic ■ K" — > RU {+oo}. 
It is a proper lower semicontinuous convex function that satisfies 

. , s f 0, if x G C, 
ic(x) = { ,, 

y +oo, otherwise. 

Let /: R™ — s- R U {+00} be a proper lower semicontinuous function. The domain of / is denoted 
and defined by dom/ = {x e R™ : f(x) < +00}. The subdifferential of / at x g dom/ is denoted by 
df(x) (see, e.g., Section 2 of [3] for the definition). Note that df(x) = if x ^ dom/. A necessary 
(but not sufficient) condition for x* <E 1™ to be a minimizer of / is df(x*) 3 0. A point that satisfies 
the last condition is called a critical point of /. 

Finally, C k denotes the set of functions that are k times continuously differentiable. 



1 We used the SPARCO toolbox available at 



http : / /www . cs .ubc . ca/labs/scl/sparco/ 9 
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2. A non-convex optimization technique 

In this section, we first identify the solution of the compressed multi-view imaging problem with the 
minimizer of a non-convex functional and propose an algorithm to minimize this functional. 

2.1. Solution as minimizer of a non-convex functional 

Let e J? be an upper bound on the energy of the measurement noise, i.e., ||n||2 ^ £• Our objective is 
to find a set of parameters 9* and images x* that satisfy the constraint ||A(0*) x* — y\\2 ^ e, where 

AiS(0i) Ai ... 



A(0) = 



• Imx (l+l)n 



AjS(e,) o 

As infinitely many solutions satisfy this constraint, prior information is needed to restrict the set of 
admissible solutions. To regularize this ill-posed inverse problem, we can, for example, search for 
images sparse in a wavelet basis by minimizing the .t?i-norm of their wavelet coefficients, or seek a 
solution that minimizes the Total Variation (TV) norm if the images are piecewise constant [29] . We 
may also require the transformation parameters 9j, with j — I,..., I, to belong to some convex sets 
Qj = {Oj eR q : Oj < 0-j < 9j} where Oj € R q and 9j € R q are pre-defined upper and lower bound^J 
Therefore, an estimate of the images and transformation parameters can be obtained by solving a 
minimization problem of the form 

(2.3) mxnf(x)+K\\A(e)x-y\\l subject to 9 e 6, 

where / : R(' +1 ) n -> MU{+oo} is a proper lower-semicontinuous convex function, k > is a regularizing 
parameter that should be adjusted with the noise level e, and 9 = {9 = (9J, . . . ,9J) J G R lq : 9j £ 



Problem (2.3) is, in general, non-convex and finding a global minimizer is not trivial. Nevertheless, 
one can still try to find a local minimum using an alternating minimization technique such as the block 
coordinate descent method 39 . Recently, Attouch et al. proposed in [3] a new type of alternating 



algorithm for the minimization of a non-convex functional and, more importantly, managed to prove 
that the sequence generated by the algorithm converges to a critical point of the functional for a 
wide class of problems. As said by the authors, their algorithm can be interpreted as a proximal 
regularization of the Gauss-Seidel method where cost-to-move functions are added in the minimization 
procedure. These convergence results were then generalized to inexact descent methods that satisfy a 
sufficient decrease condition (31 . Based on this work, we develop a minimization method for problem 



(2.3) and prove that the generated sequence converges to a critical point {x* ,9*) of the functional 
L : R('+i)nxi 9 ^ Ru { +0O } defined as 

(2.4) L(x, 9) = f(x) + K || A(0) x-y\\l+ i e (9). 

We recall that this condition is necessary for (x*,0*) to be a global minimizer of Problem ( |2.3| ) but 
is, unfortunately, not sufficient. 

The proposed algorithm is a descend method, i.e., it generates a sequence of estimates (x k ,9 k )ken 
such that L(x k+1 , 9 k+1 ) ^ L(x k ,9 k ) for all k £ N, and consists of two steps. We start by describing 
each of these steps and study the convergence of the algorithm in Section [3| 

2.2. First step of the algorithm 

Let (x k , 9 k ) € ]R( / + 1 ) n x be the estimates obtained after k iterations of the algorithm. The first step 
consists in finding a new estimate x k+1 g JRC+ 1 )™ that decreases the value of the objective function L 
while keeping 9 k fixed. Let us choose x k+1 as a solution of 

(2.5) mm f(x) + k \\k(9 k ) x - y\\j + ^ g ( x - x % 



2 Let = (0i)i^ 9 G R q , 6 = e K 9 , 9 sg means that 9 t sg 0, for all i e {1, . . . , q}- 
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where X x > 0, and g: R( /+1 )™ — > R i s pr oper lower-semicontinuous convex function such that g(x) ^ 
for all x E R(' +1 K It is clear that implies 

(2.6) L{x k+ \9 k ) + y g(x k+1 - x k ) ^ L(x k ,O k ). 

Hence L(x k+1 7 9 k ) < L{x k ,6 k ) with a decrease of \ x g(x k+1 — x k )/2, at least. Note that the above 
minimization problem is convex in x and that it can be solved using, e.g., the algorithms presented 

In Problem (2.5|, the function g acts as a proximal term and X x as a stepsize. It ensures a 
sufficient decrease of the functional L at each iteration and is essential to prove the convergence of 
the sequence (x k , k )keN to a critical point. This function also provides, to some extent, a control on 
the evolution of the generated sequence, and we realized that it must be chosen carefully to achieve 
good reconstructions. Note that in the work [3] of Attouch et al, the function g is a squared ^ 2 -norm. 

From the multi-view measurements y\ , . . . , yi , the algorithm should reconstruct the background 
image, separate the occlusions (foreground), and provide the transformation parameters that register 
the initial images between them. This problem is solved here by alternatively refining the estimation 
of the images x (background and foreground) and of the transformation parameters 6 (image registra- 
tion) . We noticed that to improve the quality of the registration, it is better to have a reconstruction 



of the images in a coarse-to-fine scales fashion, as in 23 42 . In these papers, the authors use a dedi- 



cated measurement matrix that allows this type of reconstruction. In this paper, this goal is achieved 
by choosing a function g that favors the reconstruction of the coarse scales first and then of the fine 
scales. There is thus no constraint on the choice of the measurement matrices anymore. 

In order to have a coarse-to-fine scales reconstruction of the images, we use a wavelet tight-frame 
W € R™ xp , with p > n, i.e., WW T is equal to l„ € W ixn , the identity matrix. One should remark 
that most of the largest wavelet coefficients of a natural image live at the coarsest scales and that the 
amplitude of these coefficients decreases when going to finer scales (expect at the few singularities) . 
To achieve our goal, we need a function g that favors a reconstruction of the coarse scales first, by 
selecting the biggest wavelet coefficients, and then of the fine scales, by selecting the smallest wavelet 
coefficients. Let M> g k('+ 1 )™ x ( ; + 1 )p be the block-diagonal matrix built by repeating I + 1 times the 
matrix W on the diagonal. We have v|/\J/t = \n+i) n - Take x k+1 as a minimizer of f(x) + n \\k(0 k ) x — 
2/ 111 + \\^ J (x — x k )\\ i/2, and remember that the ^i-norm favors solutions with few large coefficients. 
Hence, the function / forces x k+1 to fit our prior, the quadratic term forces x k+1 to be consistent with 
the observations, and the cost-to-move function x H > \\H> J (x — a; fc )||i imposes that x k+1 differs from x k 
by only a few large wavelet coefficients. Starting from x° = £ at k = 0, the first estimated 

images x 1 have thus a sparse decomposition in the wavelet frame W. The sparsity increases with the 
stepsize parameter X x . As we are reconstructing natural images, these few large coefficients mainly 
live at large scales and x 1 is a coarse approximation of the solution. The estimation of the images 
is then refined at the next iteration, and the cost-to-move function favors the selection of the next 
biggest wavelet coefficients living at finer scales. We thus have a coarse-to-fine scales reconstruction 
of the images. This behavior will be illustrated in Section [4] 

Note that to be able to prove the convergence of the sequence (x k ,6 k )ken to a critical point of 
L with the results presented in 3,4 , we actually substitute the Huber function for the ^-norm as 
cost-to-move function. The Huber function is a smooth approximation of the £i-norm that depends 
on a smoothing parameter /i > 0. This parameter can be chosen small so that both functions are 
nearly identical. Let a = (aj)i^j^ p € R(' +1 ) p , the Huber function : M^ 1 ^ — > K satisfies 

(Z+i)p 



i=X 



where 



hi = \ if ' a * l<M ' Vie {!,... ,(l + i)p}. 

1 \ati\ + fi/2, otherwise, L v ' ' 



From now on, the term g{x — x k ) in Problem (2.5) is replaced by h^(ty J (x — x k )). 
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2.3. Second step of the algorithm 

The second step consists in updating the transformation parameters to further decrease the value of 



the objective function L. As the functions 9 i-> \\A(9 ) x — 3/H2 and iq are separable in 9j, 
j = 1, . . . , I, we optimize the transformation parameters separately for each observations. 



ith 



Let (x k ,9 k 



p(Z+l)ra 



x 6 be the es tima tes obtained after k iterations of the algorithm and 
x k+i g kg solution of problem ( 2.5 1 . To simplify the notations, we introduce new functions 

Q k+1 : Ri - 



with j — 1 , . . . , I , defined as 



(2.7) 



fe+i 



v\\l 



Note that ||A(0)jc fc+1 - y\\\ = ^=1 Qj +1 ( d j) and th at i (9) = ie^dj). We choose to update 
the transformation parameters with the following projected Newton-like method 10 31 . 

Let us assume that the entries of the matrix S(9j) are differentiable with respect to the transforma- 
tion parameters. The first order Taylor expansion of S(0j)x^ +1 at d\ is S(9 k )x^ +1 + J(0^)(0j - 6$) 
with 



Therefore, 



with 



Q k J +1 (e k ) + (^Q k+l {9 k ) 1 9 J 



9 k ) + ^{0^ - e k ^ n2 



2- 



VQ k+1 (9 k ) = 2 (A,J(^)) T (A,S(^>S +1 + A i£C ) +1 - y) 



is a second order approximation of Qj +1 at 9 k . The positive semi-definite matrix 



(2.8) 



H(0*) = 2 (A,J(0*)) T (A,J(0*)) , 



can be viewed as an approximation of the Hessian of the function Q^ +1 - To update the transformation 
parameters, we chose to minimize this quadratic approximation to which we add another quadratic 
term that ensures a decrease of the objective function. We take as next estimate of the transformation 
parameters 

(2.9) 9 k+1 = argmin <VQj +1 (^), 6j - 6*) + \ (6, - 9 k , [H(0*) + TX e \ q ] (9, - 6*)) , 
where \ q £ R qxq is the identity matrix, Ae > 0, and i is the smallest positive integer such that 



gfe+i^fc+i) ^ gfe+i( fe) + (yQ k+1 {9 k ),9 k+l - 9 k ) 



(2.10) 



- (9 k+1 - 9 k , [H(9 k ) + (T - l)\ e \ q ] (9 k+1 - 9 k )) . 



The exist ence of such an i is discussed in the next section. Let us remark that if the minimiza- 
tion (2.9) was performed over all W instead of 6^, the solution would be 9 k+1 = 9 k - [H(9 k ) + 
2 4 A £ (l ( j] _1 VQ* :+1 (0|), as in a Newton method with Hessian V\(9 k ) + 2 l \ \ q . One can also note that 
if we had chosen V\(9 k ) — then 9 k+1 would be obtained with a simple projected gradient update. 
We noticed however that the Newton-like update (2.9 1, which can, for example, be solved using the 
algorithms presented in [8], yields more accurate results. 

To check that our choice of new transformation parameters decreases the value of the objective 
function L, one just has to combine (2.9) and (2.10) to realize that 



Q k+1 (0 



nfe+l 

~2" j 



9 k 



fen 2 
ill 2 



1 



Q k+1 (9 k ) + (VQ k+ \9 k ), 9 - 9 k ) + - (9, - 9 k , [H(9 k ) + TX e \ q ] (9, - 9 k )) , 
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for any 6j £ Qj. Choosing 8j — 6 k in the last inequality, multiplying it by k, and summing all the 
inequalities obtained for j from 1 to I yields 

(2.11) L(x k +\ 6 k+1 ) + - 6 k \\l < L(x k+1 ,9 k ). 

Therefore L(x k+1 , 9 k+1 ) sC L(x k+1 ,O k ) with a decrease of KA e ||0 fc+1 - k \\l/2, at least. 

3. Convergence analysis 

In this section, we analyze the convergence of the optimization method described in Section [2] We 
first present sufficient conditions under which the generated sequence converges to a critical point 
of L. Then, we describe several cases where these conditions are satisfied. In particular, we study 
different types of geometric transformations which meet these requirements. Finally, we discuss the 
influence of the stepsizc parameters on the final reconstruction quality. 

Algorithm 1 

Inputs: measurements y € R lm , tight frame U' € E^ +1 ^™ x ^ +1 - lp , regularization parameter n > 0, 
stepsizes (A^ ) feeN , A# > 0, and bounds 6^6. 

Initializations: set k = 0, a; = E M^ 1 '™, and 6° e 6. 

repeat 

1) Set 



x 



k+1 <- argmin L(x, 6 k ) + h„ (W(x - x k )) . 



k 



c6 R(l + l)« 2 



2) For all j = 1,...,/, set 



k+1 <- argmin (VQ k+1 (O k ), 0, 9 k ) + - (9 j 9 k , [H(8 k ) + T\ e \] {6, - 6 k )) , 



where Q k+1 is defined in (2.7), H(0^) is defined in (2.8), and i is the smallest positive integer 



such that (2.10) is satisfied. 
3) k <r- k + 1 
until convergence or k ^ fc max 

Outputs: Estimated images x* and transformation parameters 6* 



3.1. General convergence result 

Let us consider Algorithm 1 which is a summary of the optimization method described in the previous 
section. We are now in position to state our main convergence result. 



Theorem 1. Let L be the objective function defined in (2.4) with n > 0. Assume that L is bounded 
below, that the entries of Sj, with j = I,..., I, are twice continuously differentiable, that 6 
K (i+i)«x(;+i)p satis fi es \i/\|/t = \ {l+1)n , and that the stepsizes satisfy < A < A^,A e sC A for all 
k E N. Then, the sequence of estimates (x k ,6 k )i : ^ generated by Algorithm 1 is correctly defined and 
the following statements hold. 



1) Forallk^O, 



(3.12) 



L(x k+ \ e k+1 ) + 



x 



\e k+1 ~e k \\\ 



h f ,(W(x k+1 -x k )) 



^ L(x k ,9 k ). 



(3.13) 



Hence L(x k ,6 k ) does not increase. 
2) The sequences (x k+1 — x k )k^ti and (0 k+1 — 9 k )k^n converge. Indeed, 

k+1 -x k \\ 2 + \\9 k+1 -e ki 



lim 1 1 a::" 

k— > +OO 



! ||2=0. 
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3) Assume that L has the Kurdyca-Lojasiewicz property (see Definition 3.2 in [31). Then, if the 
sequence (x is bounded, the sequence (x ,0 )k£f$ converges to a critical point (x*,0*) of 

L. 



Proof. Let us start by showing that the sequence generated by Algorithm 1 is well-defined. For the 
first step of the algorithm, note that, for all (x, 0) £ x R lq and A > 0, x i— > L(x, 8) is a proper 

lower semicontinuous convex function which is bounded below and that x H ► A/i M (U' T (x — x))/2 
is a proper lower semicontinuous convex function which is coercive. Therefore, the function x i— > 
L(x,9) + Xh^( } i' 1 (x — x))/2 has a minimizer (Corollary 11.15, 17). For the second step of the 



algorithm, remark that, for all (xa,Xj) £ 



the function Qj : 0j H> \\AjS(0j)x o + AjXj — 



is C 1 with Lipschitz continuous gradient on the closed and bounded convex set Qj. Let Aj be this 
Lipschitz constant, whose value a priori depends on (xo,Xj). Using the descent lemma (Lemma 
3.1, [i]), we have 

Qiie)) < Qj(0j) + v()jo-::.o] - el) + ^ 



1^ 



32 11 2 



for any two points (0j, 0?) £ 0|. This proves the existence of an integer i such that (2.10) is satisfied 
at each second step of Algorithm 1. The existence of a minimizer in Problem (2.9) is then proved 
by using, e.g., Proposition 11.14 in 7. An induction finally shows that sequence (x k ,6 k )kefii is 

well-defined. 

Inequality (3.12) follows by combination of (2.6) and (2.11) and by using the fact that A^, \q ^ A 



for all k £ N. Then, summing this inequality from k — to K yields 



A 



K 
fc=0 



fc+i 







fen 2 
2 



As L is bounded below and L(x°,0°) — 
inequality is also bounded. Consequently, 



hp(V T (x k+1 - x k )) <: L{x°,e°)-L(x K+ \e K+1 ) 

it is clear that the righthand side of the previous 



+00 



(W(x 



k+1 



x k )) 



jfc+1 



ife||2 



< +OO, 



and, using the definition of and the fact that U/vJ/t = | 



0+1) 



the second point of the theorem holds. 



The proof of the third point make use of results established by Attouch et al. in [4] and can be 
found in Appendix. □ 

The third point of Theorem [l] applies if L has the Kurdyca-Lojasiewicz property. As explained 
in j3j, this property is satisfied by several types of functions including semi- algebraic ones. A function 
g : R n — > M U {+00} is semi-algebraic if its graph {(x,t) £ M. n x E : g(x) — t} is a semi-algebraic 
subset of K™ +1 , i.e., if it can be written as a finite union of sets of the form {x £ K n+1 : Pi(x) = 
0, Qi(x) < 0, i = 1, . . . , r}, where pi and qi are polynomials. The ^i-norm is thus an example of such 
a function. Note that the indicator function of a semi-algebraic set is semi-algebraic, and that the set 
of semi-algebraic functions is stable under basics operations such as sums, products or compositions 
(Section 4.3, [3]). 

In the definition of the objective function L, the set is semi-algebraic. Consequently, L satisfies 
the Kurdyca-Lojasiewicz property if, e.g., / and the fidelity term ||A(0)cc — y\\\ are semi- algebraic. 
For /, one can use, for example, f(x) = \\<t>x\\i with any real matrix <J>. This choice ensures that L 
is bounded below. Also note that if <t> has full column rank, then, using condition (3.12), one can 
show that the sequence (x^ken is bounded. For the fidelity term, its semi-algebraicity depends on 
the properties of the generating function ip and the type of geometric transformations. In the next 
section, we present examples of functions tp and geometric transformations such that this term is 
semi- algebraic and such that the entries of the matrices S^, with j = 1, . . . , I, are twice continuously 
diffcrcntiable, as required by Theorem [l] 
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3.2. Examples of admissible geometric transformations 

Let us start by studying the simple case of translations. This type of transformation can be represented 
by 2 parameters 9 = (9 2 ) T € and a function tq defined as 

t (u) = (ui + 01, u 2 + e 2 y. 



Combined, for example, with the cubic interpolator defined in 18 as generating function ip(u) 
4>(u\)4>(u 2 ) where 

(3.14) 0(«) = 



u 



■\\u\ 



0, 



_ 5 
2 

3,5 
' 2 



if ^ \u\ < 1, 
if 1 s$ \u\ < 2, 
otherwise, 



the entries of the matrix Sj are twice continuously differentiable with respect to the transformation 
parameters, and the function (0,x) H> ||A(0)a: — y||§, is semi-algebraic. Theorem [l] thus applies in 
this situation. Let us remark that this choice of generating function allows us to have sparse matrices 
Sj, with j = 1, . . . , I, with very fast implementation. Note that, instead of the cubic interpolator, we 



could also have chosen any B-splinc interpolators of degree larger than 3 40 



Then, with the same choice of generating function, we can also handle the case of affine transfor- 



mations, 
with 



Indeed, such transformations can be represented by 6 parameters 9 



(3.15) Tg(u) = {6 x u x + 6 2 u 2 + 3 , 04Ui + 9 5 u 2 + 6 ) T , 

and the requirements of Theorem [l] are also met. 

Next, let us show how to handle the case of homographies. An homography is usually represented 
using 8 parameters 9 = (6i, . . . 9 s y € K 8 with rg satisfying 

_ / Oim + e 2 u 2 + 6> 3 e 4Ul + e 5 u 2 + e 6 y 

Te[U) \ e 7 u x + 9 8 u 2 + 1 ' 6 7 ui + 6 s u 2 + 1 J ' 

Unfortunately, the function (9,x) M> \\A(9)x — y\\ 2 is not semi-algebraic in this case. However, if 
\67U1 + 0gu 2 1 <C 1, the transformation function can be approximated by 

< 3 -»» «<">-( SsffiS)* 1 -**-*"")- 

using a first order Taylor expansion. All the conditions are now fulfilled to apply Theorem [l] Note 
that the condition \O7U1 + s u 2 \ <C 1 can be enforced during the reconstruction by using appropriate 
bounds 9 and 9. 

Finally, let us mention that (small) rotations can be handled in the same manner by using a Taylor 
expansion of the sine and cosine functions. 



3.3. Influence of the stepsizes 

Let us highlight that, even though the sequence (x k , 9 k )keN generated by the algorithm converges to a 
critical point of L for all strictly positive stepsize Xg and strictly positive bounded sequence (Ak) fceN , 
different choices of stepsizes yield different results. The choice of these parameters is thus important. 

For the sequence of stepsizes (X^keN, we start from large values and thus constrain x h+1 to stay 
"close" to x h when the estimated images are still inaccurate. Starting with large values is also essential 
to have the first estimated images made of wavelets at coarse scales only. Then, we slightly decrease 
the stepsize value at each iteration as the estimates become more and more accurate. In all the 
following experiments, we use the same sequence of stepsizes: = max(0.9 fe (20k), 0.1). 

Finally, the stepsize Xg seems to have less influence on the final reconstruction quality, and we keep 
it fixed at 0.1 in all the experiments of the next section. 
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Figure 1. Top panels: first 5 observed images yi,...,ys. Middle panels: registered background images 
S(0*)xq, with j = 1, ... ,5. Bottom panels: estimated foreground images x\, . . . ,x%. 




4. Experiments 

In this section, we demonstrate, experimentally, the efficiency of the algorithm for three different 
applications: robust image alignment, compressed sensing, and super-resolution. The code and data 
needed to reproduce the results presented in this section are available at http : / /lts2w ww . epf 1 ■ ch/| 
people/gilles/ softwares . 

4.1. Robust image alignment 

For this first experiment, we test our algorithm for robust image alignment, as studied in 24 , 25 1 . This 
problem consists in aligning several images of a given scene despite the presence of occlusions, as, for 
example, the images presented in the top panel^] of Fig. [T] 

In 24025 , this problem is solved with a method based on low rank and sparse approximation. To get 



around the difficulty of optimizing the transformations parameters that appear in their data fidelity 
term, the authors propose to linearize this term with respect to the transformation parameters. This 
approximation is valid for small transformations only but the optimization problem becomes convex 
and easier to solve. Then, to align images with large transformations between them, the authors 
propose to repeatedly solve this convex problem and linearize the data fidelity term around the new 
estimated parameters to improve, little by little, the registration. 



Here, we solve the robust image alignment problem using the measurement model (1.2) and Algo- 
rithm 1 to minimize (2.4). In the measurement model, the matrices Ai,..., A; are set to the identity 
l„ G M" xn , and yi,...,yi contain the occluded observed images. Ideally our algorithm should es- 
timate the background image x , i.e., the wall and the windows in Fig. [T] extract the occlusions in 
Xi, . . . , xi, i.e., the branches, and provide the transformation parameters 0\, . . . , 0;, which align the 
background image on the observed images. 



Dataset available at http://perception.csl.illinois.edu/matrix-rank/rasl.html 
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Figure 2. Left panel: superposition of the 10 observed images. Right panel: superposition of the 10 observed 
images after registration with the estimated parameters 0* , , 0* o . 
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Figure 3. Sequence of estimated background images Xq +1 at k — (left), k = 10 (middle left), k = 20 (middle 
right), and at convergence (right). 



In our experiment, we use Z = 10 occluded images of n = 256 x 256 pixels. The first 5 images 
are presented in the top panels of Fig. [T] We then run Algorithm 1 with k = 100, Xg = 0.1, 
A* = max(0.9 fc (20k), 0.1), and /i = 10~ 10 . The matrix is build using the Haar wavelet basis, 
and f(x) is set to ||U' T a;||i. Finally, we assume that the transformations between images are well 
modeled by homographies and use model (3.161 for Tg. The parameters are constrained as follows: 
|1 - 0i | , |1 - 0s | < 0.2, \9 2 \ , \0i\ ^ 0.2, \0 3 \ , \9 6 \ < 20, \0 r \ , \9 S \ ^ 0.01, with m and u 2 belonging to 
{-127, -126,..., 128}. 

The aligned background images S(9*)xq, with j — 1, . . . , 5, and the estimated foreground images 
x , . . . , x ^ are presented in Fig. [T] One can already notice that the separation background-foreground 
is very accurate. In particular, the background image does not contain any object initially occluding 
the scene. To have a better visualization of the quality of the estimated transformations parameters, 
we present in Fig.[2]the superposition of the 10 observed images before and after registration with the 
estimated parameters. One can easily remark that the registration is also very accurate. 

In Fig. pi we present the evolution of estimated background image x^ +1 at iterations k = 0, k = 10, 
k = 20, k = 30, and at convergence. This sequence of images illustrate the fact that, as explained in 
Section [2~2} the images are reconstructed in a coarse-to-fine scales fashion. 



For comparison, one can check in 24 25 the performance reached by the low-rank and sparse 



approximation technique on the same dataset. Both methods provide very similar reconstructions. 



Note, however, that no guarantee about the convergence of the method described in 24|25 is provided 



4.2. Compressed sensing 



For this second experiment, we study the performance of our algorithm in a compressed sensing 
setting, and compare it with other common algorithms. 
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Figure 4. Top panels: 5 initial images. Middle-top panels: reconstructed images with method 
Middle-bottom panels: reconstructed images with method J^.iffi . Bottom panels: reconstructed images 
with the proposed method. 
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Figure 5. Left panel: Reconstruction SNR as a function of m/n. Right panel: Reconstruction SNR as a 
function of the noise level a. All curves represents the mean SNR over the 30 x 5 reconstructed images. The 
vertical lines shows the variation of the SNR at one standard deviation. The dashed black, continuous blue, 
and dot-dashed red curves represent the reconstruction SNR for method (^.i7| ), f^.iffi , and our algorithm, 
respectively. 
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Figure 6. Left panel: superposition of the 5 initial images without registration. Middle panel: superposition 
of the 5 initial images after registration with the estimated parameters 9% , . . . , <?J . Right panel: estimated 
background image Xq. 



We use I — 5 different image^] of the same scene to generate 5 different measurement vectors 
yi, . . . ,yi. These images, presented in the top panels of Fig. |3J contain n — 128 x 128 pixels. Note 
that parts of the scene are sometimes occluded. The measurement vectors are obtained using the 
spread spectrum technique of |28| , which consists of pre-modulating each image with a random ±1 
sequence before randomly selecting in Fourier coefficients. Let c— [c,\, . . . , c n ) T € K™ be a Rademacher 
sequence, F S c« x ™ b e the matrix that implements the 2D discrete Fourier transform, and G N m 
be a set of m indices selected independently and uniformly at random in {1, . . . , n}. As we are dealing 
with images living in M™, we can restrict the selection of the Fourier coefficients to one half of the 
Fourier plane. We denote F <E c«/ 2x ™ restriction of F to this half plane. The observation matrices 
satisfy 

A 1 = ... = A ; =A=^~|^ Ce R mx ", 

where F r ,P £ R™/ 2x ™ are respectively the real and imaginary part of F, C <E R nxn is the diagonal 
matrix containing the sequence c, and (-)q restricts annxu matrix to its rows indexed by Q. Let 
us remark that, unlike here, one can choose a different subset f2 for each matrix Aj to introduce more 
innovations in the set of acquired measurements and to benefit even more of the correlation between 
the acquired signals when using a joint reconstruction technique like ours. 

As in the last experiment, we assume that the transformations between images can be modeled 



by homographies and use model (3.16) for Tg. For the prior term /, we assume that the images are 
sparse in the Haar wavelet basis W € W lXn . We build the block-diagonal matrix W by repeating I + 1 
times the matrix W on the diagonal, and set f(x) = ||U ,T a;|| 1 . In all simulations, we use Xg = 0.1, 
A* = max(0.9 fc (20k), 0.1), /i = 10~ 10 , and the following constraints apply on the transformations 
parameters: |1 - 6 X \ , |1 - 5 | 0.2, \0 2 \ , |0 4 | «S 0.2, |0 3 | , |0 6 | sC 20, |0 7 | , |0 8 | < 0.01, with m and u 2 
belonging to {—63, . . . , 64}. 

The first set of simulations study the performance of Algorithm 1 in the absence of noise for different 
number of measurements: m € {O.ln, 0.2n, 0.3n, 0.4n, 0.5n, 0.7n}. The second set of simulations study 
the performance of Algorithm 1 at m = 0.3n for different noise levels. The noise vector n follows an i.i.d 
zero-mean Gaussian distribution of standard deviation a £ {0.01, 0.0177, 0.0316, 0.0562, 0.1, 0.177, 0.316}. 
The squared i^-norm of the noise vector n/a follows a chi-square distribution with Im degrees of 
freedom and we compute the bound on the noise level ||n|j| ^ e 2 using the 99 th percentile of this 
distribution. Then, k is set to 100 in the noiseless case and is chosen so that ||A(0*)a;* — y\\ 2 ~ e in 
the presence of nois^j 

For comparison, we also reconstruct the images by solving the following convex optimization prob- 
lems: 

(4.17) min||W T X||i subject to || Y - AX|| F sC e, 

^castle-R20 dataset available at cvlab. epf 1 ■ ch/-strecha/multiview/rawMVS .html| [34| . 

^For each noise level, we use one simulation to estimate a value of K such that, at converge of the algorithm, 
1 1 A(0* ) tc * — y\\2 G [0.99e, l.Ole]. We then use the same value of k for all subsequent simulations. 
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and 



(4.18) 



where X = (aji, ...,xi) € I 
, i 3 \, and ||X|| 2 ,i 



min || W T X|| 2i i subject to 

vnxl 



|Y- AX|| F e, 



En sr^l 
i=l 2-rj=l 



Y = (y 1 ,...,y t ) e R mxl , \\ ■ \\ F is the Frobenius norm, ||X||i = 

,-=i l x ijl) an d Il^ll2,i = Si"=i(Sj=i a 'ii) 1/ ' 2 ! w --h ccjj the entry on the i th line and j th column 
of X. In the first problem, the images aji, . . . , X; are reconstructed with a sparse prior in the Haar 
wavelet basis. This is an extension of the Basis Pursuit problem, advocated by the compressed sensing 
theory [12) , to multiple signals. In the second problem, the prior term favors reconstructions which 
have a similar sparsity pattern in the Haar wavelet basis [5j, therefore imposing some correlation 
between images. 

Fig. [H] shows the quality of the reconstructed images in term of SNR. Let (x*,0*) € x R ln 

be the images and transformation parameters recovered with our algorithm and x £ M. ln be the initial 
I images. The reconstruction SNR satisfies — 20 log 10 (\\S(6*)xQ + x* — XjHa/H&jlh)- F° r the two 
other methods (4.17) and (4.18), the reconstruction SNR satisfies — 20 log 10 (||a;* — sCj lla/ll 35 ^* where 
X* = (xj, . . . , x"l) G IR nx/ are the reconstructed images with these methods. The first graph of Fig. [5] 
shows the reconstructions SNR as a function of m/n in the absence of noise, and the second graph 
shows the reconstructions SNR as a function of the noise level a for m/n = 0.3. In each case, we 
perform 30 simulations with independent noise realization and choice of f2. The curves show the mean 
SNR over the 30 x 5 reconstructed images. The vertical lines indicate the variation of the SNR at 
one standard deviation. One can deduce from these curves that the best reconstructions are obtained 
with our method. 

In Fig. [4j we present the ground truth images as well as reconstructions obtained with the three 
different methods for m/n = 0.3 in the absence of noise. On can remark that the reconstructions 
obtained with our algorithm exhibits much finer details. Fig.[6]shows the estimated background image, 
next to the superposed initial images before and after registration with the estimated parameters. One 
can also note that the estimated background image is free of occlusions, and that the initial images 
are better aligned after registration with the estimated parameters. 



4.3. Super-resolution 

For this last experiment, we test our algorithm for super-resolution from multiple frames, as considered 
in, e.g., 



1G 



41 



We use I = 16 low-resolution image a^with m = 64 x 64 pixels extracted from a video sequence and 
try to increase the resolution by 2 in both directions, i.e., n = 128 x 128. 



To increase the resolution of these images, we use the measurement model (1.2) where the sensing 
matrices Aj, with j = 1, . . . , /, model a blurring and downsampling operator that averages all the pixels 
in a rectangular window of 2 x 2 pixels and uniformly downsamples the resulting blurred image by a 
factor of 2 in both dimensions. The sensing matrices are identical for all j. The vectors y±, . . . , yi are 
initialized with the initial low-resolution images. The transformations between images are assumed to 
be affine, and we use model (3.15) for tq. Then, we assume that the h igh- resolution images Xq, .. . ,xi 
are sparse in the gradient domain and set f(x) — Y^j=o 9( x j) m ( 2.4 ) , where g: W l —> M is the 
anisotropic Total Variation (TV) norm. Let us recall that the vector xa contains the samples Xj(uk), 
with k = 1, . . . , n and Uk = {u\ , u|) (see Section 1.1 ). The anisotropic TV norm g satisfies 



9i x i) = J~] \V 1 x j (u k )\ + \V 2 Xj(u k )\ , 



k=l 



where 



Vi Xj(u k ) = Xj{u\ +1 ,ul) 



x 3 {u\,ul), 



3 Dataset available at http://users.soe.ucsc.edu/~milanfar/software/sr-datasets.html (Credit: Peyman 



Milanfar). 
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Figure 7. Top panels: first 5 input images upsampled independently by a factor 2 in both dimension using 
a bicubic interpolation. Middle panels: first 5 reconstructed high resolution images S(0*)xq + x* , with 
j — 1, ... ,5, with our algorithm. Bottom panels: first 5 reconstructed foreground images x*, . . . , x%. 










5®u 



Figure 8. Left panel: superposition of the 16 input images upsampled independently by a factor 2 in both 
dimension using a bicubic interpolation. Middle panel: superposition of the same 16 images after registration 
with the estimated parameters 61, ... , 0l a . Right panel: estimated background image Xq at high-resolution. 



assuming that a;(it™ +1 , ■) = x(- ,ii2 +1 ) = 0. Note that g and, consequently, / are semi- algebraic. 
Finally, we run Algorithm 1 to minimize Kty with k = 100, \ g = 0.1, A£ = max(0.9 fe (20k), 0.1), 
and n = 10~ 10 . We also constrain the transformation parameters as follows: |1 — #x| > |1 — ^ 0.2, 
\9 2 \ , |6» 4 | < 0.2, |0 3 | , \9 6 \ 64, with u x and u 2 belonging to {-63,. . . ,64}. 

In Fig. [7J we present the first 5 super-resolved images with 2 different techniques. The top panels 
show the image upsampled independently using a bicubic spline interpolation, and the middle panels 
show the super-resolved images using our method, which exploits the inter-correlation between images. 
Note that the brand of the car and license plate number become readable with our reconstructions. We 
also present, in the bottom panels, the first 5 reconstructed foreground images which, here, essentially 
correct slight registration errors for the first reconstructed image. 

In Fig. [8j we present the estimated background image alone, and the input images superposed 
before and after registration. One can notice that the images are correctly aligned after registration. 



Image reconstruction from multi-view measurements 



17 



5. Conclusion 

We presented a method to solve multi-view imaging problems where one has to reconstruct several im- 
ages of a scene from only a few linear observations made at different viewpoints. Each observed image 
is modeled as the sum of a geometrically transformed background image and of a foreground image, 
modeling possible occlusions. We considered here global geometric transformations represented by 
few parameters, such as translations, afhne transformations, or homographies. The proposed recon- 
struction method jointly estimates the images (background and foreground) and the transformations 
parameters by minimizing a non-convex functional. We studied the convergence of the proposed al- 
gorithm and showed that the generated sequence of estimates converges to a critical point of the 
non-convex functional for commonly used priors and transformation models. 

An obvious limitation of our technique arises when the geometric transformations of the background 
image between two different observations are not global but local. This situation for example occurs 
when, unlike the images used in this paper, the depth of the observed scene is not constant. In this 
case, the geometric transformations differ from one pixel location to another. However, the following 
two properties of the proposed method may help to compensate for such an issue. First, one should 
note that the presence of foreground images in the measurement model can render our method robust 
to a small transformation model mismatch by considering as "occlusions" the parts of the scene that 
cannot be aligned with the assumed model. Second, let us also remark that the requirements of 
Theorem [T] hold for a large class of transformation models tq and leave us the possibility to choose 
more general type of transformations. For example, we could approximate elastic transformations 
using a polynomial of the spatial coordinates (see, e.g., [20]) and estimate the coefficients of this 
polynomial using the proposed algorithm. 

Finally, we would like to conclude by highlighting the potential interest of the proposed technique 
in free breathing coronary magnetic resonance imaging (MRI) [ 33] . One of the major challenges in 
this application is to handle respiratory motion properly. The low acquisition speed in MRI forces 
researchers to invent new techniques to compensate for the inevitable motions occurring during the 
acquisition. To avoid motions due to heart contractions, an ECG signal is acquired to ensure that the 
Fourier measurements are always taken at the same instant in the heart cycle. A few measurements 
are then taken at each cycle. As the number of measurements acquired during one cycle is too 
small to accurately reconstruct an image of the heart, one should combine the measurements of 
several cycles to gather enough information. It is thus mandatory to properly compensate for the 
respiratory motion occurring between heart beats in the reconstruction method. Let us remark that 
simple geometric transformations, such as translations, are already sufficient to reach good image 



quality 11 . Consequently, as the technique developed in this paper automatically compensates for 



such geometric transformations, it is a promising method for such an application. 



Appendix A 

To establish the third point of Theorem [T] we need the following theorem establish by Attouch et al. 
in|. 

Theorem 2 (Theorem 6.2, El). Let p be a positive integer larger than 2, x — (xi, . . . , x p ) be a vector 
in% = R ni x ... x R n p, and g: K -> M U {+00} a function of the form g(x) — Q(x) + Ya=i 9i( x t): 
where Q : % — > M is a C 1 function with locally Lipschitz continuous gradient and gi : K ni — > RU{+oo} 
is a proper lower semicontinuous function, i — l,...,p. Assume that g is bounded from below and 
satisfies the Kurdyca-Lojasiewicz property. Let (x )k£j$ &nd (v k )keN be sequences such that 



g t (xr l ) + Q(x$ 



2. 5 X i 1 • • • 1 X p 1 1 0Ml2'« 



9i( x i ) ~^ Q( x l~^~ ) • * * ) x i—l 1 x i i ■ • ■ 1 x p)t 

(A.19) ^ fc+1 e %(^' +1 ), 



\v* +1 + d Xi Q(x k 1 +\...,x*+\ X l ° +1 ,...,x k p ) 

fe+l „kl 



^ 6|K +i -x 



2. 
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where i ranges over {1, . . . ,p} and a, b > 0. Then, if the sequence (x k )k£ti is bounded, it converges to 
some critical point x* of g. 



The objective function L denned in (2.4) is equal to 

L(x, 8) = k Q(x, 9) + f(x) + i e (0), 

where Q : R(' +1 )" x R lq -> K satisfies Q(a;,0) = ||A(0)cc-y|||. The objective function has the the form 
required by Theorem[2] Furthermore, / and i© j are proper lower semicontinuous functions, and, using 
the assumptions of Theorem [T] Q is twice continuously differentiable and thus have locally Lipschitz 
continuous gradient. We also note that L is bounded below and satisfies the Kurdyca-Lojasiewicz 



property. To prove the third point of Theorem [I 



Algorithm 1 is bounded and satisfies conditions (A. 19) 



it remains to show that the sequence generated by 



First, due to the constraints that applies on 9, it is obvious that the sequence (9 k ) keN is bounded. 
Therefore, assuming that (x k )keN is bounded, as in the third point of Theorem]!] is enough to ensure 
the boundedness condition of the entire sequence of estimates. 

Second, ||a; fe+1 — x k \\2 tends to as k — > +oo, as established in the second point of Theorem [IJ 
Therefore there exists k G N such that for all k ^ k , h fl (^ J (x k+1 -x k )) = \\W (x k+1 -x k )U/(2fj,) = 
\\x k+1 — x k \\ 2 / (2fj,), as WW 1 = I (/+i)„. Let k be larger than kg in the following. Inequality (2.6 ) shows 
that 



k+l\ 



k Q(x k+ \e 



- *\x k+1 



x k \\i ^ f{x k ) + n Q(x k ,e k 



Furthermore, the first order optimality condition of Problem (2.5) shows that there exists v x £ 
df{x k+1 ) such that 



,fc+i 



k d x Q(x k +\0 



(x k+1 -x k )=0, 



2/i 



which implies, 



fc+i 



A 



d x Q{x k+ \e k )\\ 2 < — \\ x 



k+1 



x%. 



Finally, we deduce from equation (2.11) that 

2 1 



i (O k+1 ) + K Q(x k+L ,9 k+L ) + 



k+1 ok+l\ 



e k+1 -e k \\l ^i e (o k ) + K Q(x k+1 ,o k ). 



And, using the first order optimality condition of the problems (2.9), we conclude that there exists 
G di Q] {9 k+1 ) such that 



,fc+i 



k VQ k+1 (9 k ) + k [H(0*) + 2 l \ e \ q ](9 k+1 - 9 k ) = 



for all j — Let T denote the Lipschitz constant of VQ on a product of balls B x x Bg 

containing the sequence (xk, Ok)keN- By construction of i in Algorithm 1, we necessarily have 2 l Xg 
T] = max(2r, Xg). Let us also remark that, on the same products of balls, their exists A such that the 
singular values of the matrices Y\{9 k ), j — are bounded above by A. Therefore, 



4+ l +n VQ k+1 (9 k+1 ) 



\V e . -TK Wj' \?j ■ )\\2 

= + nVQ k {9 k ) + nVQ k+1 {9 k+1 ) - nVQ k+1 {9 3 



3 

KVQ k+1 (9 k )\\ 2 + K\\VQ k+1 (9 k+1 ) - VQ k+1 (9 k )\\ 2 



< k(A + V ) \\9 k+1 - 9 k \\ 2 + K T\\9 k+1 - 9 k \ 



n(A + r] + T) \\9 



k+l 



which yields, 



.fc+i 



Kd e Q(x k+L ,9 k+i )\\ 2 < k (A + ^ + T) \\9 



k+i 
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with v > k+1 = ((^+ 1 ) T , . . . , (^' i +1 ) T ) T G di e (9 k+1 ). The sequence (x k , 6 k ) k>ko thus satisfies conditions 
(A. 19 1 with a = min(A/(4/i), nXg/2) and b = max(A/(2/i), k(A + -q + V)). This terminates the proof 
of the third point of Theorem [T] 
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